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1  Introduction 


Model  comparison  by  hypothesis  testing  is  a  tool  in  inverse  or  parameter  estimation  problems  to  distinguish 
mathematically  relevant  features  in  a  biological  or  physical  system  under  study.  Consider  that  one  has  some 
dynamical  system  which  describes  a  biological  process 

^  =  g(x,y(x;q),q)  (1) 

2/(0)  =  2/o- 

Given  a  set  of  observations  of  some  component  (s)  of  the  system  under  study  at  various  instances  in  time,  it  is  a 
standard  inverse  problem  to  determine  the  value  of  the  model  parameter  q ,  out  of  a  set  Q  of  admissible  parameters, 
which  best  describes  the  observed  data.  But  an  equally  important  problem  is  to  establish,  on  a  quantitative  basis, 
that  the  biophysical  mechanisms  represented  in  the  mathematical  model  are  necessary  to  describe  the  observed 
data,  and  that  no  extraneous  mechanisms  have  been  included.  This  is  the  role  of  model  comparison  tests. 

For  certain  classes  of  models,  potentially  extraneous  mechanisms  can  be  eliminated  from  the  model  by  a 
simple  restriction  on  the  underlying  parameter  space,  while  the  form  of  the  mathematical  model  (1)  remains 
unchanged.  For  instance,  consider  the  convection-diffusion  model 

du  du  d2u 

- b  V —  =  V - 

dt  dx  dx 2  ’ 

where  T>  is  the  diffusion  coefficient  and  V  is  the  bulk  velocity  of  a  transporting  fluid.  Such  a  model  has  been 
used  to  describe  the  transport  of  labeled  sucrose  in  the  brain  tissue  of  cats  [4] .  Yet  if  the  cerebral  transport  of 
sucrose  is  primarily  driven  by  diffusion,  then  the  convection  term  in  the  model  above  is  extraneous.  Given  data 
involving  the  transport  of  labeled  sucrose,  one  would  like  to  compare  the  accuracy  of  two  mathematical  models. 
In  the  first,  both  convection  and  diffusion  play  a  role  and  one  must  estimate  the  parameters  V  and  T>  in  an 
inverse  problem.  In  the  second,  it  is  assumed  that  convection  is  not  present  (V  =  0),  and  only  the  coefficient  of 
diffusion  T>  is  estimated.  Thus,  in  effect  one  has  two  mathematical  models  which  must  be  compared  in  terms  of 
how  accurately  and  parsimoniously  they  describe  an  available  data  set.  Equivalently,  this  problem  can  be  cast  in 
the  form  of  a  hypothesis  test:  one  would  like  to  test  the  null  hypothesis  V  =  0,  against  the  alternative  hypothesis 
that  V  ^  0.  Under  the  null  hypothesis,  the  restricted  mathematical  model  is  contained  within  the  mathematical 
model  subject  to  the  unconstrained  parameter  space  (in  the  sense  that  estimating  both  V  and  T>  simultaneously 
includes  the  possibility  that  V  =  0).  For  this  reason,  the  unconstrained  model  must  necessarily  fit  the  data  (in  an 
appropriate  sense)  at  least  as  accurately  as  the  constrained  model.  The  role  of  the  hypothesis  test  is  to  compare 
these  two  models  on  a  quantitative  basis  to  investigate  whether  the  model  with  V  ^  0  provides  a  significantly  (in 
some  sense)  better  fit  to  the  data. 

As  the  above  example  makes  clear,  there  is  some  similarity  between  model  comparison  testing  and  hypothesis 
testing  for  a  set  of  mathematical  models  which  have  the  same  mathematical  form  but  differ  in  the  restrictions 
placed  on  the  underlying  parameter  space.  In  such  a  case,  one  may  resort  to  hypothesis  testing  as  a  technique 
for  model  comparison.  In  this  note,  we  are  interested  in  the  situation  where  the  parameter  set  for  the  restricted 
model,  Qh ,  can  be  identified  as  a  linear  subset  of  the  admissible  parameter  set  of  the  unrestricted  model,  Q  (this 
notation  will  be  defined  explicitly  in  Section  3).  In  addition  to  the  study  of  convective  mechanisms  in  models 
of  fluid  transport  in  cat  brains  [4],  hypothesis  tests  have  also  been  used  to  investigate  mortality  and  anemotaxis 
in  models  of  insect  dispersal  [5,  6].  These  examples  are  summarized  (focusing  on  the  role  of  hypothesis  testing) 
in  [2,  8].  More  recently,  model  comparison  by  hypothesis  testing  has  been  conducted  to  investigate  the  division 
dependence  of  death  rates  in  a  model  of  a  dividing  cell  population  [10],  to  evaluate  the  necessity  of  components 
in  viscoelastic  stenosis  models  [26],  and  to  serve  as  the  basis  for  developing  a  methodology  to  determine  if  a 
stenosis  is  present  in  a  vessel  based  on  input  amplitude  [3].  Thus  the  discussions  and  methods  presented  here 
have  widespread  applications  in  biological  modeling  as  well  as  in  more  general  scientific  modeling  efforts. 

While  a  number  of  techniques  for  quantitative  model  comparison  exist,  many  of  these  rely  upon  assumptions 
which  are  highly  restrictive  or  can  be  hard  to  verify  in  practice.  For  instance,  one  can  use  likelihood  ratio  tests 
(e.g.,  [16])  in  order  to  evaluate  the  relative  likelihoods  of  any  pair  of  mathematical  models.  However,  these  tests 
require  the  existence  and  complete  mathematical  specification  of  probability  density  functions  describing  the 
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likelihood  of  a  set  of  observations.  Alternatively,  one  can  use  information  theoretic  model  comparison  tests  such 
as  Akaike’s  Information  Criterion  (AIC)  or  similar  tests  (BIC,  TIC,  etc.)  [14].  Still,  such  tests  typically  require 
a  complete  likelihood  specification.  Though  a  least  squares  analog  of  the  AIC  test  exists,  it  is  only  valid  if  ah 
observations  are  considered  to  be  sampled  from  identical  normal  distributions  with  a  constant  variance  (i.e. ,  i.i.d. 
normal) . 

Unlike  likelihood  estimation,  least  squares  estimation  generally  requires  only  the  first  two  statistical  moments 
of  the  observations  to  be  specified  [13,  Clr.  3].  Additionally,  the  mathematical  formulation  is  intuitive,  and  the 
resulting  estimates  are  more  robust  to  error  misspecihcation  (when  compared  to  likelihood  estimation)  [17,  Ch. 
2].  As  such,  there  is  an  obvious  appeal  for  model  comparison  tests  which  are  easy  to  apply  in  a  least  squares 
framework.  Given  the  above  discussion  regarding  the  relationship  between  model  comparison  and  hypothesis 
testing  for  certain  classes  of  problems,  we  focus  our  attention  on  model  comparison  in  a  least  squares  framework. 

Hypothesis  testing  for  least  squares  problems  was  considered  at  length  by  Gallant  [19].  Subsequently,  these 
results  were  reconsidered  by  Banks  and  Fitzpatrick  [2]  under  a  slightly  different  set  of  assumptions  to  develop  a 
hypothesis  test  which  could  be  computed  from  the  residual  sum  of  squares  (RSS)  after  fitting  the  models  to  a  data 
set  in  a  least  squares  framework.  Unfortunately,  this  work  was  limited  to  cases  in  which  the  observations  were 
assumed  to  be  characterized  by  constant  variance.  Both  the  works  of  Gallant  as  well  as  Banks  and  Fitzpatrick 
rely  upon  an  asymptotic  theory  for  nonlinear  least  squares  to  demonstrate  their  results. 

In  this  presentation  we  begin  by  defining  a  general  least  squares  estimation  problem.  Next,  we  review 
features  of  the  asymptotic  theory  relevant  to  model  comparison  by  hypothesis  testing.  Finally,  we  further  extend 
the  results  of  Banks  and  Fitzpatrick  (in  a  manner  suggested  possible  by  Gallant)  to  a  more  general  variance 
model.  An  example  involving  cell  proliferation  models  with  nonconstant  variance  in  the  experimental  data  is 
used  to  illustrate  application  of  the  new  model  comparison  techniques. 


2  Least  Squares  Estimation 

We  begin  with  the  dynamical  system  (1)  above,  which  describes  some  hypothesized  mathematical  relationship 
between  an  observed  process  ye!1,  an  independent  variable  (covariate)  x  £  Rm,  and  a  set  of  parameters  q  £  Q. 
For  notational  convenience,  we  assume  without  loss  of  generality  that  k  =  1.  Let  f{x\q)  be  the  solution  to  (or 
observation  for)  the  dynamical  system.  Suppose  also  that  observations  of  the  underlying  process  are  noisy  (that 
is,  subject  to  random  error  and/or  fluctuations).  Following  standard  inverse  problem  procedures  [13,  15,  17,  24], 
we  may  relate  the  mathematical  model  to  the  noisy  data  via  a  statistical  model, 


Yk=f(xk-,q*)+£k.  (2) 

Here,  {£k}k=i  are  random  variables  which  account  for  any  error  (measurement  error,  modeling  error,  microfluctu¬ 
ations,  etc.)  between  the  mathematical  model  and  the  recorded  observations  at  values  {xk}k=i  of  the  independent 
variable.  Hence  the  {Y/C}Z=1  are  also  random  variables.  It  is  assumed  here  that  the  values  {xk}  are  known  exactly. 
The  parameter  q*  is  the  hypothetical  ‘true’  parameter  which  is  assumed  to  generate  the  data  for  the  observed 
process.  It  has  been  tacitly  assumed  that  the  underlying  biological  or  physical  system  is  perfectly  represented  by 
the  mathematical  model.  (In  fact,  any  mathematical  model  is  only  an  approximation  to  reality.  For  the  sake  of 
argument,  one  can  assume  that  the  mathematical  model  is  sufficiently  accurate  so  that  any  modeling  error  is  sub¬ 
sumed  within  the  error  random  variables  £k  ■  One  can  relax  this  assumption,  though  doing  so  greatly  complicates 
[18,  19]  the  analysis  presented  below.)  It  is  further  assumed  that  the  error  random  variables  are  independent, 
though  not  necessarily  identically  distributed,  with  central  moments 

E[£k]  =0,  1  <  k  <  n, 

V ar{£k)  =  a2w(xk)2,  1  <  k  <  n. 


With  the  assumptions  and  definitions  above,  we  may  define  the  weighted  least  squares  cost  function, 


t  WLS 
^  n 


(<?) 


f_  /  Yfc  ~  / {xk\  q)  \ 

n  V  w(xk)  ) 


(3) 
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The  weighted  least  squares  estimator  is  defined  to  be  the  random  variable  q^LS  which  minimizes  the  weighted 
least  squares  cost  function  for  a  given  set  of  random  variables  {Yk},  that  is, 


q™LS  =  arg  min  J™LS{q), 

q&Qad 


(4) 


where  Qad  C  Q  is  the  set  of  admissible  values  for  the  parameter  q.  We  remark  that  q^LS  is  a  random  variable 
which  is  dependent  upon  the  random  variables  {Yk}  (and  hence  the  error  random  variables  {£&})■  This  dependence 
is  suppressed  in  the  notation  above  (with  the  exception  of  the  subscript  n)  for  simplicity  and  clarity,  although 
we  can  write  q{{LS  =  q{{LS  ({Yk})  when  necessary.  In  the  event  w{xk)  =  1  for  all  1  <  k  <  n,  the  formulas  above 
reduce  down  to  the  familiar  ordinary  least  squares  formulae 


1  n 

(5) 

71  k= 1 

<lnLS  =  arg  nrin  J°LS(q).  (6) 

q&Qad 


Finally,  we  remark  that  it  is  often  convenient  to  use  a  vectorized  notation  for  the  least  squares  problems.  Define 
f(q)  =  {f{xi',q), . . . ,  f(xn;q))T.  Then  the  statistical  model  (2)  may  be  rewritten  as 

Y  =  f(q)+£  (7) 


where 


E[£\  =  0 

Cov(£ )  =  a2  diag(w(x\)2 , . . . ,  w{xn)2) 

=  a2W. 

Then  clearly  we  can  rewrite 

JnLS{q )  =  l  (y  -  /(g))  T  W-1  (y  -  f(qj)  (8) 

JnLS{q)  =  \{y-  f{q))T  (y  -  f{q] ))  .  (9) 

The  goal  of  an  inverse  or  parameter  estimation  problem  is  to  use  a  set  of  observations  {yk}  (realizations  of  the 
random  variables  {Tfc})  in  order  to  determine  an  estimate  of  the  true  underlying  parameter  q*.  Intuitively,  since 
E[Yk]  =  f{xk',q*)  for  all  1  <  k  <  n,  we  would  expect  the  least  squares  estimator  (either  q^LS  or  q°LS )  to  be 
‘close’  to  q*  in  an  appropriate  sense.  In  practice,  one  uses  the  realized  data  {yk}  to  determine  an  estimate  (ffLS 
or  q^LS  which  is  a  realization  of  the  appropriate  least  squares  estimator.  In  this  paper,  we  are  less  interested  in 
the  computational  issues  associated  with  finding  an  estimate  and  more  interested  in  characterizing  the  statistical 
properties  of  the  estimator  itself.  Since  the  estimator  is  a  random  variable,  it  is  described  not  by  a  numerical 
value  but  rather  by  a  probability  distribution,  and  we  may  use  results  from  asymptotic  theory  to  characterize  the 
distribution  of  the  estimator. 

Going  forward,  for  notational  simplicity  the  dependence  of  the  estimators  and  cost  functions  on  n  may  be 
suppressed  (e.g.,  using  JOLS{q)  instead  of  J°LS(q)).  This  is  the  case  particularly  during  the  discussions  of  the 
cell  proliferation  example  in  Section  5. 


3  Asymptotic  Theory  and  Model  Comparison  for  Ordinary  Least 
Squares 

We  now  review  some  existing  results  on  the  asymptotic  properties  of  nonlinear  ordinary  least  squares  estimators, 
particularly  those  relevant  to  hypothesis  testing.  The  notation  and  organization  of  this  section  closely  follows  that 
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of  Banks  and  Fitzpatrick  [2].  As  mentioned  previously,  the  work  of  Banks  and  Fitzpatrick  was  partially  inspired 
by  the  work  of  Gallant  [19],  and  many  of  their  results  and  assumptions  are  similar.  In  the  discussion  that  follows, 
we  comment  on  any  differences  between  the  two  approaches  that  we  feel  are  noteworthy.  While  Gallant’s  work 
is  remarkably  general,  allowing  for  a  misspecified  model  and  a  general  estimation  procedure  (both  least  mean 
distance  estimators  and  method  of  moments  estimators  are  included),  we  do  not  consider  such  generalities  here. 
The  comments  below  are  limited  to  least  squares  estimation  with  a  correctly  specified  model  (recall  the  discussion 
following  Equation  (2)  above). 

In  this  section  we  follow  [2]  in  focusing  exclusively  on  asymptotic  properties  and  model  comparison  for 
i.i.d.  (absolute)  error  models.  The  theoretical  results  of  Gallant  similarly  focus  on  i.i.d.  errors,  though  some 
mathematical  tools  are  discussed  which  help  to  address  more  general  error  models  [19,  Ch.  2;  pgs.  156-157].  In 
fact,  these  tools  are  used  in  a  rigorous  fashion  in  the  next  Section  to  extend  the  results  of  [2]. 

We  begin  by  establishing  some  notation  and  precisely  defining  the  hypothesis  testing  problem.  It  is  assumed 
that  the  error  random  variables  are  defined  on  some  probability  triple  (Cl,  T ,  P)  and  take  their  values  in  a  space  E. 
By  construction,  it  follows  that  the  data  {Yk}  as  well  as  the  estimators  q^LS  and  q®LS  are  also  random  variables 
defined  on  this  probability  triple,  and  hence  are  functions  of  oj  G  Cl  so  that  we  may  write  Yk(u>),  q°LS(w),  etc., 
as  necessary. 

As  in  the  discussion  of  Section  2,  we  consider  parameters  q  £  Qad  C  <5  for  a  topological  space  Q.  Given  the 
model  f(x;q)  defined  above,  it  is  assumed  that  the  sampled  values  of  the  independent  variables  satisfy  Xk  €  X 
for  all  k,  and  for  some  metric  space  X.  Moreover,  if  the  set  X  is  totally  ordered  (see  Assumption  (A3)  below), 
then  one  can  define  the  empirical  distribution  function, 

1  n 

Vn(x)  =  ~  V  XXk(x) 

n  z ' 
k= 1 

where 

A  (x)  =  f  °’  x~Xk 

Xk  ’  |  1,  otherwise 

Clearly,  /x„  €  V{X),  the  space  of  probability  measures  on  X. 

Finally,  we  return  to  the  original  goal  of  model  comparison  by  hypothesis  testing.  For  now,  we  use  the  notation 
of  ordinary  least  squares  estimation,  in  keeping  with  the  results  presented  in  [2] .  We  have  already  defined  above 
the  standard  ordinary  least  squares  estimator 

q°LS  =  arg  min  J°LS(q). 

qeQad 

As  alluded  to  in  the  introduction,  we  might  also  consider  a  restricted  version  of  the  mathematical  model  in  which 
the  unknown  true  parameter  is  assumed  to  lie  in  a  subset  Qh  C  Qad.  C  Q  of  the  admissible  parameter  space.  We 
assume  this  restriction  can  be  written  as  a  linear  constraint,  Hq*  =  h,  where  H  is  a  known  linear  function  and  h 
is  a  known  vector.  Thus  the  restricted  parameter  space  is 

Qh  { q  c  Qad  •  E q  —  h} . 

Then  the  null  and  alternative  hypotheses  are 

H0:q*e  Qh 
HA:q*  Qh- 


We  may  define  the  restricted  estimator 


Q°,hS  =  arg  min  J%LS{q),  (10) 

q€QH 

and  the  corresponding  estimate  q°^f .  It  is  clear  that  J°LS  (q°LS)  <  J°LS (<?^ijS),  since  Qh  C  Qad ■  This  fact 
forms  the  basis  for  a  model  comparison  test  based  upon  the  residual  sum  of  squares. 
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With  these  definitions,  attention  can  now  be  directed  toward  establishing  a  quantitative  basis  upon  which 
the  null  hypothesis  H0  should  either  be  accepted  or  rejected.  Again,  the  results  presented  below  are  paraphrased 
from  [2],  and  comments  have  been  included  to  indicate  the  alternative  approach  due  to  [19].  No  proofs  are  given 
here,  though  the  interested  reader  can  find  a  complete  set  of  proofs  in  [2]  and  [19].  First,  we  consider  the  following 
set  of  assumptions: 


(Al) 

(A2) 

(A3) 

(A4) 

(A5) 


(A6) 


(A7) 


(A8) 


The  error  random  variables  {£k}  are  independent  and  identically  distributed  random  variables  with  distri¬ 
bution  function  P(£).  Moreover,  E[£\  =  0  and  Cov(£)  =  cr2In  where  In  is  the  n  x  n  identity  matrix. 

<5  is  a  separable,  finite-dimensional  topological  space  (i.e.,  Q  C  Rp),  and  Qad  is  a  compact  subspace  of  Q 
with  q*  e  int(Qad)- 

A  is  a  compact  subset  of  Rm . 

The  function  /(•;•)  €  C2(Q,  C(X)). 

There  exists  a  finite  measure  /i  on  X  such  that 

~y2g(xk)=  f  g(x)dnn(x)  — t  [  g{x)dg( x) 
n  k= i  Jx  Jx 

for  all  continuous  functions  g.  That  is,  the  probability  distributions  gn  converge  weakly  to  /./,  or  fin  converges 
to  ji  in  the  weak*  topology  (when  P(X)  is  viewed  as  a  subset  of  C*(X),  the  dual  of  the  space  of  continuous 
functions). 

The  functional 

J*(q)=a2+  [  (f(x;  q*)  —  f(x;  q))2  dg(x) 

Jx 

has  a  unique  minimizer  at  q*  £  Qad- 
The  matrix 


J  = 


d2J*(q*) 


dq2 


=  2 

=  2 


df(x;q*)df(x;q*)T  ,  d2f(x-,q*) 


x  \  dq  dq 

df(x-iq*)df(x;q*)T 


+ 


dq2 


(f(x;q*)  -  f(x;  q*))  dg{x) 


ix 


dq  dq 


dg{  x) 


is  positive  definite. 

The  matrix  H,  of  dimension  r  x  p,  has  full  rank  r. 


The  most  notable  difference  between  the  assumptions  above  (which  are  those  of  [2])  and  those  of  [19]  is 
assumption  (A5).  In  its  place,  Gallant  states  the  following. 

(A5')  Define  the  probability  measure 

v{S)=  [  [  Is(£,x)dP(£)dg(x) 

Jx  J E 

for  a  set  S  C  E  x  X  and  g  defined  as  above.  Then  almost  every  realized  pair  (ek,Xk)  is  a  Cesaro  sum 
generator  with  respect  to  v  and  a  dominating  function  b(£,x)  satisfying  fXxEbdi>  <  oo.  That  is, 


lim  — 

n— >-oo  fi 


^2g(ek,xk)  = 


k= 1 


X  J  E 


g(£,  x)du(£:  x) 
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almost  always,  for  all  continuous  functions  g  such  that  \g(£,x)\  <  b(£,x).  Moreover,  it  is  assumed  that  for 
each  x  £  X,  there  exists  a  neighborhood  Nx  such  that 

/  sup  b{£,x)dP(£)  <  oo. 

Je  nx 

The  assumption  (A5')  is  stronger  than  the  assumption  (A5)  as  it  supposes  not  only  the  existence  of  a 
dominating  function,  but  also  involves  the  behavior  of  the  probability  distribution  P,  which  is  generally  unknown 
in  practice.  The  practical  importance  of  the  dominating  function  b  arises  in  the  proof  of  consistency  for  the 
least  squares  estimator  (see  Theorem  3.1  below).  As  has  been  noted  elsewhere  (see,  e.g.,  [1],  [24,  Ch.  12]),  the 
strong  consistency  of  the  estimator  is  proved  by  arguing  that  J*(q)  is  the  almost  sure  limit  of  JnLS(q)-  Thus, 
if  JnLS(q )  is  ‘close’  to  J*(q)  and  J*(q)  is  uniquely  minimized  by  q*,  it  makes  intuitive  sense  that  q°LS,  which 
minimizes  J°LS(q),  should  be  close  to  q*.  This  task  is  made  difficult  by  the  fact  that  the  null  set  (from  the 
‘almost  sure’  statement)  may  depend  on  the  parameter  q.  In  [2],  the  almost  sure  convergence  of  JnLS{q )  to  J*(q) 
is  demonstrated  constructively;  that  is,  by  building  a  set  A  £  T  (which  does  not  depend  on  q)  with  P(A)  =  1 
such  that  J%LS(q)  J*(q)  for  each  oj  £  A  and  for  each  q  £  (J .  This  construction  relies  upon  the  separability 
of  the  parameter  space  Q  (assumption  (A2))  as  well  as  the  compactness  of  the  space  X  (assumption  (A3)).  The 
alternative  approach  of  Gallant  uses  a  consequence  of  the  Glivenko-Cantelli  theorem  [19,  pg.  158]  to  demonstrate 
a  uniform  (with  respect  to  q)  strong  law  of  large  numbers.  The  proof  relies  upon  the  dominated  convergence 
theorem,  and  hence  the  dominating  function  b.  As  a  result,  Gallant  does  not  need  the  space  Q  to  be  separable 
or  the  space  X  to  be  compact.  It  should  be  noted,  however,  that  in  most  practical  applications  of  interest  Q  and 
X  are  compact  subsets  of  Euclidean  space  so  that  relaxing  these  assumptions  provides  little  advantage. 

The  choice  of  dominating  function  b(£,x)  deserves  further  comment.  Define  the  functional  h  :  E  x  X  ^  I 

by 

h(£,  x)  =  {£  +  f(x,  q*)  -  f(x,  q))2  . 

Then  b(£,x)  is  chosen  so  that  h,  as  well  as  its  derivatives  J^-,  dq.Qq.  and  the  product  terms  (for  all 

combinations  of  1  <  i,j  <  p),  are  dominated  by  b  for  all  q  £  Q  [19,  pg.  256].  The  domination  of  the  function  h  by 
the  z/-integrable  function  b  can  be  used  to  obtain  a  number  of  desirable  results,  including  uniform  convergence  and 
the  interchange  of  differentiation  and  integration  [19,  Lemma  3].  As  a  result,  one  does  not  need  a  positive  definite 
hessian  matrix  J  to  prove  the  asymptotic  normality  of  scores  [19,  Thm.  4]  (contra.  [2,  Thm.  4.1]).  Rather,  one 
only  needs  J  to  be  nonsingular  in  order  for  the  conclusion  of  Theorem  3.2  below  to  be  well-defined. 

While  the  list  of  assumptions  above  is  extensive,  we  remark  that  the  set  is  not  overly  restrictive.  Assumptions 
(A2)  and  (A3)  are  naturally  satisfied  for  most  problem  formulations  (although  the  requirement  q*  £  int(Qad )  may 
be  occasionally  problematic  [2,  Remark  4.4]).  Assumptions  (A4)  and  (A8)  are  easily  checked.  Though  assumption 
(Al)  may  be  difficult  to  verify,  it  is  much  less  restrictive  than,  say,  a  complete  likelihood  specification.  Moreover, 
residual  plots  [13,  Ch.  3]  can  aid  in  assessing  the  reliability  of  the  assumption. 

Assumption  (A5)  is  more  difficult  to  check  in  practice  as  one  does  not  know  the  limiting  distribution  fi. 
Of  course,  this  is  simply  an  assumption  regarding  the  manner  in  which  data  is  sampled  (in  the  independent 
variable  space  X)-namely,  that  it  must  be  taken  in  a  way  that  ‘fills  up  the  space’  in  an  appropriate  sense. 
Similarly,  assumptions  (A6)  and  (A7)  cannot  be  verified  directly  as  one  knows  neither  g  nor  q*.  In  many 
practical  applications  of  interest,  /j  is  Lebesgue  measure,  and  one  can  assess  the  assumptions  at  q°LS  (which  is 
hopefully  close  to  q*).  Of  course,  if  assumption  (A7)  holds,  then  assumption  (A6)  must  hold  at  least  for  a  small 
region  around  q* ,  though  possibly  not  on  all  of  Qad- 

As  noted  above,  assumption  (A7)  is  not  strictly  necessary  if  one  uses  assumption  (A5')  in  the  place  of  (A5). 
Given  assumptions  (A2)-(A4),  it  follows  that  the  function  h  (and  its  relevant  derivatives;  see  above)  is  bounded 
(and  hence  dominated  by  a  immeasurable  function)  provided  the  space  E  in  which  the  random  variables  £k  take 
their  values  is  bounded.  On  one  hand,  this  has  the  desirable  effect  of  weakening  the  assumptions  placed  on  the 
Hessian  matrix  J .  Yet  the  assumption  that  E  is  bounded  precludes  certain  error  models,  in  particular  normally 
distributed  errors. 

Finally,  we  remark  that  while  we  only  consider  the  linear  restriction  Hq*  =  h,  it  is  possible  to  extend  the 
results  above  to  include  nonlinear  restrictions  of  the  form  H{q*)  =  h  [19,  pg.  47ffj.  In  such  a  case,  one  is  interested 


7 


in  the  restricted  parameter  space 


Qh  =  \  q  G  Qad  '■  H{q)  =  hj  . 

Assuming  the  null  hypothesis  is  true,  one  can  construct  the  linearization 

i(9)  =  A(?*)  +  ^h(5-9*) 
=  U^(9-9‘). 


dq 


Then 


Qh  ~  Q  G  Qad  •  L(cl)  h  j" . 


But  the  condition  L(q)  =  h  is  equivalent  to  the  condition  that 


dH(q*)  dH(q*)  * 

— o ^  =  — o - 9 

oq  oq 


Thus,  under  the  null  hypothesis,  the  nonlinear  restriction  H(q*)  =  h  is  asymptotically  locally  equivalent  to  the 
linear  restriction  of  assumption  (A8)  with  H  =  c)H^  ^  and  h  =  dHJj ^  ^  q* .  As  such,  we  only  consider  the  problem 
of  testing  a  linear  hypothesis.  For  nonlinear  hypotheses,  the  results  presented  here  are  accurate  to  the  order  of 
the  linear  approximations  above. 

We  now  give  several  results  which  summarize  the  asymptotic  properties  of  the  ordinary  least  squares  estimator. 
Theorem  3.1.  Given  assumptions  (A1)-(A6),  q°LS  — >  q*  with  probability  one.  That  is, 


P 


( jw  g  n 


, OLS( 


<ln~~  M 


'}) 


=  1. 


This  theorem  states  that  the  ordinary  least  squares  estimator  is  consistent.  We  remark  that  the  finite  dimen¬ 
sionality  of  the  parameter  space  Q  (see  assumption  (A2))  is  not  necessary  in  the  proof  of  this  theorem,  and  it  is 
sufficient  for  the  function  /  to  be  continuous  from  Q  into  C(X)  rather  than  having  two  continuous  derivatives. 
Given  Theorem  3.1,  the  following  theorems  may  also  be  proven. 

Theorem  3.2.  Given  assumptions  (A1)-(A7),  as  n  — >  oo 


V^(q%LS-q*)^Ar(0,2a2J-'), 

where  the  convergence  is  in  distribution,  and  occurs  with  probability  one. 

Theorem  3.3.  Given  assumptions  (A1)-(A8)  and  assuming  Hq  is  true,  then  as  n  oo 


V^KLHs-£LS)^*r(o 

where 

V  =  2a2HT{HJ-1HT)~lH. 

Again,  the  convergence  occurs  with  probability  one. 

In  effect,  Theorem  3.3  states  that  the  restricted  and  unrestricted  OLS  estimators  will  converge  to  one  another 
provided  Hq  is  true.  Now  we  can  turn  to  the  main  result  of  interest.  Define 

rrOLS  n{jOLS{qOLHS)_jOLS{qOLS)) 

J°LS{Q°LS) 


Theorem  3.4.  Under  assumptions  (Al)-(A8)  and  assuming  Hq  is  true,  then  as  n  — >  oo 

U%LS  A  X2(r) 

with  probability  one,  where  X2(r)  is  a  Chi-square  distributed  with  r  degrees  of  freedom. 

Proofs  of  Theorems  3. 1-3.4  can  be  found  in  [2].  We  remark  again  that  very  similar  results  (under  slightly 
differing  assumptions)  can  be  found  in  [19]  where  it  is  shown  that  the  statistic 

(n  -  p)  (J^S(qZLHS)  -  J2LS(tiLS)) 

J°LS(q°LS ) 

is  asymptotically  distributed  as  X2(r)  [19,  pg-  236].  The  two  model  comparison  statistics  U°LS  and  U°LS  are 
asymptotically  equivalent,  the  difference  in  scaling  factors  amounting  to  a  bias  correction  in  the  estimation  of  the 
variance  parameter  a2.  See  [2,  Thm.  4.6]  and  [19,  pg.  236]  for  details. 

The  practical  importance  of  this  theorem  is  that  one  has  a  quantitative  basis  upon  which  to  assess  the 
null  hypothesis.  Because  the  unrestricted  model  contains  the  restricted  model  as  a  special  case,  we  must  have 
Jn.LS  (9«  i?S)  >  JnLS  (<inLS)  an(l  thus  & nLS  >  0  where  u„LS  is  a  realization  of  U°LS .  Heuristically,  U°LS  is  used 
to  indicate  the  relative  likelihood  that  the  results  obtained  from  a  given  realization  of  the  data  are  the  result  of 
random  chance.  A  full  discussion  of  the  use  and  interpretation  of  the  test  statistic  is  given  in  [2,  pg.  518-519] 
and  [13,  Sec.  3.5.1].  Significantly,  the  terms  JnLS (<lnLS)  and  JnLS (Qnji)  used  to  compute  iinLS  are  a  natural 
byproduct  of  the  optimizations  (6)  and  (10)  and  can  be  returned  directly  by  most  optimization  software.  It  should 
also  be  noted  that  the  limiting  distribution  for  U°LS  is  independent  of  the  model  parameterization.  As  such,  the 
results  are  robust  to  any  reparameterization  or  change  of  variables  in  the  mathematical  model  f(x ;  q). 


4  Extension  to  Weighted  Least  Squares 

The  results  presented  above  are  quite  useful,  but  they  only  apply  to  the  class  of  problems  in  which  the  error 
random  variables  are  independent  and  identically  distributed  with  constant  variance  cr2.  While  the  assumption 
of  independence  is  common,  there  are  many  practical  cases  of  interest  in  which  these  random  variables  are  not 
identically  distributed.  As  discussed  in  Section  2,  one  may  encounter  a  weighted  least  squares  problem  in  which 
E[£k\  =  0  for  all  k  but  V  ar{£k)  =  cr2iv(xk)2.  In  such  a  case,  the  results  above  (namely,  assumption  (Al))  fail  to 
apply  directly.  See,  e.g.,  [7,  Sec.  2]  and  references  therein  as  well  as  [13]  for  more  information  on  the  development 
of  error  models. 

In  order  to  extend  the  results  presented  above  to  independent,  heteroscedastic  error  models  (and  hence,  to 
weighted  least  squares  problems),  we  turn  to  a  technique  suggested  by  Gallant  [19,  pg.  124]  in  which  one  defines  a 
change  of  variables  in  an  attempt  to  normalize  the  heteroscedasticity  of  the  random  variables.  As  has  been  noted 
previously,  Gallant  used  this  technique  under  a  different  set  of  assumptions  in  order  to  obtain  results  similar  to 
those  presented  above.  This  change  of  variables  technique  will  allow  us  to  extend  the  results  above,  originally 
from  [2],  in  a  rigorous  fashion. 

Consider  the  following  assumptions. 

(Al'a)  The  error  random  variables  {£k}  are  independent  and  have  central  moments  which  satisfy  E[£]  =  0, 
Cov(£)  =  a2W  =  cr2diag(w(xi)2 , . . . ,  w(xn)2). 

(Al'b)  The  distributions  of  the  random  variables  {£k}  are  completely  determined  by  the  first  two  central  moments. 

(Al'c)  The  function  w  :  X  — >  R+  is  continuous. 

(A7')  The  matrix 

j  =  2  [  1  (  d/Qr;<f)  df  (x;  q 

“  Jx  w(x)2  {  dq  dq 

is  positive  definite. 
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Theorem  4.1.  Under  assumptions  (Al'a)-(Al'c),  (A2)-(A6),  (AT),  and  (A8), 

1.  q()'LS  —>  q*  with  probability  one,  and 

2.  (q™LS  ~q*)^U  (o,  2g2J~^  . 

Assuming  further  that  Hq  is  true, 

3 •  -  q”LS)  ^  ^  (o,  j-'Vj-') 

where 

V  =  2a2HT(HJ-1HT)~1H. 

Again,  the  convergence  occurs  with  probability  one.  Finally,  defining 

rrWLS  »  («££*)  ' ~  J™ {tfL8)) 

n  J™LS(q™LS) 

4 .  U™LS  %2(r)  with  probability  one. 

Proof.  We  first  recall  the  statistical  model  (2) 

Yk  =  f{xk\q*)  +£k , 

which  can  be  written  in  vector  form  (7)  as 

Y  =  f(q*)  +  £, 

where  now  E[£\  =  0  and  Cov(£)  =  a2W .  Define  L  =  diag(w(x i), . . . ,  w( xn)).  It  follows  that  LLT  =  L2  =  W  ( L 
is  the  Cholesky  decomposition  of  W).  Also,  L-1  exists  and  can  be  applied  to  both  sides  of  (7)  to  obtain 

L-1?  =  L-1  f(q*)  +  L-1? 
or 

Z  =  h(q*)  +  rf,  (11) 

where  Z,  h  and  ?7have  the  obvious  definitions.  The  OLS  cost  functional  for  the  transformed  model  and  data  (11) 
is 

JnLS(q)  =  l{z-kq))T  (z-Hq)) 

=  i  (l-1?  -  L-1f{q))T  ( L ~lY  -  L-lf(q)) 

=  l{z-  /(?)) T  l^l-1  (y  -  /(«)) 

=  ±  (y  -  f(q))T  W-1  (y  -m). 

In  other  words,  the  ordinary  least  squares  cost  function  with  respect  to  the  transformed  model  and  data  (11) 
is  exactly  the  weighted  least  squares  cost  function  (8)  for  the  original  model  and  data.  Thus,  in  order  to  prove 
Theorem  4.1,  we  will  show  that  the  rescaled  model  (11)  satisfies  the  assumptions  of  Theorems  3.1  -3.4. 

Clearly,  E[ff\  =  0  and  Cov(rf)  =  g2L~1WL~1  =  a2In.  Thus  the  random  variables  r/k  (the  components  of  ff) 
are  independent  and  identically  distributed  (by  assumption  (Al'b))  with  constant  variance,  and  thus  assumption 
(Al)  is  satisfied.  Assumptions  (A2),  (A3),  (A5),  and  (A8)  are  unchanged.  For  assumption  (A4),  we  must  show 
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that  the  function  h{x;q)  =  f{x\q)/w{ x)  £  C2(Q,  C(X)).  This  follows  from  assumption  (Al'c)  and  the  fact  that 
w(x)  does  not  depend  on  q.  For  assumption  (A6),  we  must  show 


J*{q)  =  a2  +  /  (h(x;q*)  -  h(x;q))  d/j,(x) 

Jx 

2 


=  a 


ix 


f{x]q*)  -  /(x;  g) 
w(x) 


df-i(x) 


has  a  unique  minimum  at  q  =  q* .  Clearly,  J*(q*)  =  a2.  Since  the  function  J*(q)  (see  assumption  (A6))  has 
a  unique  minimum  at  q  =  q*  and  w(x)  >  0,  it  follows  immediately  that  J*(q*)  >  <J2  if  q  ^  q*  so  that  J*{q) 
has  a  unique  minimum  at  q  =  q* .  Assumption  (A7)  is  satisfied  for  the  formulation  (11)  directly  by  assumption 

{AT).  '  '  □ 

In  fact,  the  proof  of  Theorem  4.1  applies  to  any  set  of  observations  in  which  a  change  of  variables  can  be 
used  to  produce  a  set  of  error  random  variables  which  are  independent  and  identically  distributed.  The  weighted 
least  squares  problem  addressed  in  the  above  discussion  arises  from  an  observation  process  in  which  the  errors 
are  assumed  to  be  independent  but  are  not  necessarily  identically  distributed.  By  rescaling  the  observations 
in  accordance  with  their  variance  (which  is  assumed  to  be  known)  one  obtains  error  random  variables  which 
are  identically  distributed  as  well  as  independent.  Even  more  generally,  it  is  not  strictly  necessary  that  the 
observations  be  independent.  For  instance,  one  might  have  observations  generated  by  an  autoregressive  process 
of  order  r.  Then,  by  definition,  some  linear  combination  of  r  observational  errors  will  give  rise  to  errors  which 
are  independent  and  identically  distributed.  This  linear  combination  is  exactly  the  change  of  variables  necessary 
to  obtain  a  model  which  is  suitable  for  ordinary  least  squares.  Thus,  even  in  the  most  general  situation,  when 
one  has  Cov{£)  =  R,  one  may  still  use  the  Clrolesky  decomposition  in  the  manner  discussed  above,  provided  one 
has  sufficient  assumptions  regarding  the  underlying  error  process.  See  [19,  Ch.  2]  for  details. 

As  in  Section  3,  the  assumption  (A7')  is  the  most  problematic  to  verify  in  practice.  In  the  proof  above  for 
the  weighted  least  squares  problem,  the  assumption  (Al)  has  been  replaced  with  assumptions  (Al'a)-(Al'c),  a 
change  which  merely  accounts  for  the  heteroscedastic  statistical  model.  Then  the  assumptions  (A2)-(A6)  and 
(A8)  for  the  rescaled  model  (11)  can  be  verified  directly  from  the  original  assumptions  (A2)-(A6)  and  (A8)  for 
the  ordinary  least  squares  formulation,  as  shown  above.  The  only  exception  is  the  assumption  (A7),  which  cannot 
be  established  directly  from  the  ordinary  least  squares  assumptions,  hence  the  need  for  assumption  (A7').  On 
one  hand,  the  existence  of  a  unique  minimum  (assumption  (A6))  is  sufficient  to  prove  that  the  matrix  J  or  J 
must  be  positive  semi-definite,  so  that  the  assumption  (A7)  or  (A7')  may  not  be  overly  restrictive.  Alternatively, 
as  has  been  noted  before,  one  can  relax  the  assumptions  (A7)  or  (A7;)  by  assuming  the  existence  of  a  dominating 
function  b.  Moreover,  provided  the  weights  w(x)  satisfy  the  slightly  stronger  requirement  w(x)  >  5  >  0  for  all 
x  £  X ,  then  one  can  use  the  dominating  function  b  (from  the  ordinary  least  squares  problem)  to  obtain  a  new 
dominating  function  b(£,x)  =  5~1b(£,x)  which  is  also  r'-integrable.  Even  in  this  case,  though,  one  must  still 
make  the  additional  assumption  that  J  is  invertible. 

All  of  the  results  above  also  readily  extend  to  systems  governed  by  partial  differential  equations;  the  essential 
elements  of  the  theory  are  the  form  of  the  discrete  observation  operator  for  the  dynamical  system  and  the  statistical 
model  as  given  in  (2). 


5  Example:  Hypothesis  Testing  in  Models  of  Cell  Proliferation 

We  now  consider  a  practical  application  of  model  comparison  for  a  weighted  least  squares  problem.  We  consider  the 
problem  of  modeling  flow  cytometry  data  for  a  dividing  population  of  lymphocytes  labeled  with  the  intracellular 
dye  CFSE.  As  cells  divide,  the  highly  fluorescent  intracellular  CFSE  is  partitioned  evenly  between  two  daughter 
cells.  A  flow  cytometer  measures  the  fluorescence  intensity  of  labeled  cells  as  a  surrogate  for  the  mass  of  CFSE 
within  a  cell,  thus  providing  an  indication  of  the  number  of  times  a  cell  has  divided.  Most  commonly,  the 
data  is  presented  in  histograms  showing  the  distribution  of  CFSE  in  the  measured  population  of  cells  at  each 
measurement  time.  A  sample  data  set  is  shown  in  Figure  1.  See  [12,  25]  and  the  references  therein  for  a  detailed 
overview  of  the  experimental  procedure. 
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Figure  1:  Typical  histogram  data  for  a  CFSE-based  proliferation  assay  showing  cell  count  vs.  log  fluorescence 
intensity;  originally  from  [22]. 


The  mathematical  modeling  problem  is  to  develop  a  system  of  equations  which  accounts  for  the  manner  in 
the  population  distribution  of  CFSE  evolves  as  cells  divide,  die,  and  slowly  lose  fluorescence  intensity  as  a  result 
of  protein  turnover.  In  doing  so,  one  must  account  for  not  only  the  division  and  death  dynamics  of  the  population 
of  cells,  but  also  how  these  processes  relate  to  the  observed  distribution  of  fluorescence  intensities.  Let  hi(t,x) 
represent  the  label-structured  density  (cells  per  unity  of  intensity,  cells/UI)  of  cells  at  time  t  having  completed  i 
divisions  and  having  fluorescence  intensity  x  from  intracellular  CFSE  (that  is,  not  including  the  contribution  of 
cellular  autofluorescence) .  Then  it  can  be  shown  that  this  density  can  be  modeled  in  terms  of  two  independently 
functioning  components,  one  accounting  for  the  division  dynamics  of  the  population  of  cells,  and  one  accounting 
for  the  intracellular  processing  of  CFSE  [20,  23].  It  has  been  shown  that  the  slow  loss  of  fluorescence  intensity  as 
a  result  of  intracellular  protein  turnover  can  be  well-modeled  by  a  Gompertz  growth/decay  process  [9,  25].  It  has 
also  been  shown  that  the  highly  flexible  cyton  model  [21]  can  very  accurately  describe  the  evolving  generation 
structure  (cells  per  number  of  divisions  undergone)  [12].  Thus  we  consider  the  model 


h(t,x)  =  Ni(t)rii(t,x) 

The  functions  fii(t,x )  each  satisfy  the  partial  differential  equation 


dni(t,x ) ktd[xni(t,x)\ 


dt 


dx 


=  0 


(12) 


(13) 


with  initial  condition 


Hi(  0,x) 


2’$(2ix) 

N0 


where  $(#)  is  the  label-structured  density  of  the  cells  at  t  =  0,  and  Nq  is  the  number  of  cells  in  the  population 
at  t  =  0,  all  of  which  are  assumed  to  be  undivided.  The  functions  Ni(t)  are  described  by  the  cyton  model, 


N0(t)  =  N0  -  f  (■ nt(s )  -  <e(s))  ds 
Jo 

Ni(t)  =  f  (2 nt\(s)  -  nfiv(s)  -  n?e(s))  ds , 

Jo 


(14) 
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where  ndlv(t)  and  ndle(t)  indicate  the  numbers  per  unit  time  of  cells  having  undergone  i  divisions  that  divide 
and  die,  respectively,  at  time  t.  In  this  regard,  the  cyton  model  (and  hence,  the  mathematical  model  as  a  whole) 
can  be  considered  as  a  large  class  of  models  differentiated  by  the  mechanisms  one  uses  to  describe  the  terms 
nflv(t )  and  ndle(t),  which  we  consider  in  more  detail  below.  We  remark  that  the  mathematical  model  (12)  can 
be  equivalently  characterized  by  the  system  of  partial  differential  equations 

=  (nd0iv(t)  -  nd0ie(t))  h0 (t,x) 

CC~ktdjW  =  -  «?''(*)  -  «?'(*))  mM) 


dh0 

dt 

dh\ 

~dt 


Given  the  solution  h(t,x)  =  ~}2hi(t,x)  in  terms  of  the  fluorescence  resulting  from  intracellular  CFSE,  one  must 
add  in  the  contribution  of  cellular  autofluorescence  to  obtain  a  population  density  structured  by  total  fluorescence 
intensity,  which  is  the  quantity  measured  by  a  flow  cytometer.  This  density  can  be  computed  as 


n(t,  x ) 


h(t,  x)p{x 


x)dx, 


where  p{xa)  is  a  density  function  describing  the  distribution  of  cellular  autofluorescence  [20]. 

Let  <j>o(t)  and  ipo(t)  be  probability  density  functions  (in  units  1/hr)  for  the  time  to  first  division  and  time  to 
death,  respectively,  for  an  undivided  cell.  Let  To,  the  initial  precursor  fraction,  be  the  fraction  of  undivided  cells 
which  would  hypothetically  divide  in  the  absence  of  any  cell  death.  It  follows  that 

ntv(t)  =  F0N0  J  ipo(s)dsj  4>0 (t) 

nd0ie(t)  =  N0(l-  F0  J  Ms)ds^  MV-  (15) 


Similarly,  one  can  define  probability  density  functions  and  for  times  to  division  and  death,  respectively, 
for  cells  having  undergone  i  divisions  (he  cellular  ‘clock’  begins  at  the  completion  of  the  previous  division),  as  well 
as  the  progressor  fractions  T)  of  cells  which  would  complete  the  ith  division  in  the  absence  of  cell  death.  Then 


nf'(t)  =  2T)  £  r4%{s)  (l-£  S  M*  -  s)da 

nfe{t)  =2  ni-i  (s)  (1~Fi  jQ  MOd^j  ^  s)ds.  (16) 


It  is  assumed  that  the  probability  density  functions  for  times  to  division  are  lognormal,  and  that  undivided  cells 
and  divided  cells  have  two  separate  density  functions.  Similarly,  it  is  assumed  that  divided  cells  die  with  a 
lognormal  probability  density.  Thus 


</>  o(t) 
<!>%(*) 
Mt) 


tadivV^ 

1 

tafivV2n 

1 

tafeV 2tt 


exp 


exp 


exp 


(log 

2  «v)2  ) 
(log  t-pfv)2\ 

2  (v?v)2  ) 

(log  t  M?e)2\ 
2(cr/ie)2  ) 


(i>  1) 

(<>!)■ 


We  treat  cell  death  for  undivided  cells  as  a  special  case.  It  is  assumed  that  the  density  function  describing  cell 
death  for  undivided  cells  is 


V’o  (t) 


Fn 


tadieV2^ 


exp 


(log*  -  Moie)2\ 

2«e)2  ) 


+  (1  Pidle')  (1 


Fo)pe-P\ 


(17) 
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where  Pidie  is  the  fraction  of  cells  which  are  will  neither  die  nor  divide  over  the  course  of  the  experiment.  The 
form  of  the  function  above  arises  from  the  assumption  that  progressing  cells  die  with  a  lognormal  probability 
density.  Nonprogressing  cells  die  at  an  exponential  rate,  except  for  the  fraction  of  idle  cells.  For  the  purposes  of 
this  paper,  we  are  interested  in  testing  the  null  hypothesis  pule  =  0  against  the  alternative  hypothesis  Pidie  7^  0. 
A  more  detailed  discussion  of  the  mathematical  model  can  be  found  in  [11]. 

With  the  mathematical  model  established,  we  now  turn  our  attention  to  a  statistical  model  of  the  data.  Let 
Nl  be  a  random  variable  representing  the  number  of  cells  measured  at  time  tj  with  log-fluorescence  intensity  in  the 
region  [zk,  Zk+ 1)-  Let  q  be  the  vector  of  parameters  of  the  model  (12),  so  that  we  may  rewrite  n(t,  x)  =  n(t,  x;  q). 
Define  the  operator 

rzk+ 1 

I[n](tj,zk;q)  =  /  10z  ln(10)n(i,  10z;  q)dz, 

J  Zk 

which  represents  the  computation  of  cell  counts  from  the  structured  density  n(t,  x)  transformed  to  the  logarithmic 
coordinate  z  =  log10  x.  Then  a  statistical  model  linking  the  random  variables  Nl  to  the  mathematical  model  is 

Nl  =  +  Aj  (l[n](tj ,  zJk;  go))  7  £kj  (18) 

where  the  A j  are  scaling  factors  [25,  Ch.  4]  and  the  random  variables  £kj  satisfy  assumption  (Al).  In  principle,  it 
is  possible  to  use  a  multi-stage  estimation  procedure  to  determine  the  values  of  the  statistical  model  parameters 
A  j  and  7  [17]  but  we  do  not  consider  that  here.  It  is  assumed  that  A  j  =  1  for  all  j. 

In  the  case  that  7  =  0,  one  has  precisely  absolute  error  and  a  statistical  model  suitable  for  the  ordinary  least 
squares  cost  function  (5)  and  estimator  (6)  to  estimate  the  unknown  parameter  q 0.  The  results  of  using  the  OLS 
estimator  can  be  found  in  Figure  2  for  the  null  and  alternative  hypotheses.  The  ordinary  least  squares  cost  of 
fitting  the  model  (12)  under  the  null  hypothesis  is  JOLS  (q^LS)  =  4.0423  x  1011.  The  ordinary  least  squares  cost 
under  the  alternative  hypothesis  is  JOLS (qOLS)  =  3.6228  x  1011  [11].  We  would  like  to  determine  whether  or 
not  the  small  decrease  in  cost  associated  with  the  alternative  hypothesis  is  outweighed  by  the  larger  number  of 
parameters  (17  versus  16)  required.  As  seen  in  Figure  2,  the  fits  of  the  two  models  to  the  data  are  very  similar, 
so  that  one  cannot  justify  a  conclusion  solely  on  the  basis  of  the  slightly  reduced  cost.  Rather,  one  should  use  a 
model  comparison  test. 

However,  as  the  theory  of  Sections  3  and  4  suggests,  one  cannot  rigorously  apply  a  model  comparison  test 
unless  the  underlying  statistical  model  is  accurate.  It  follows  from  the  form  of  Equation  (18)  that  when  7  =  0  the 
residuals  rf.  =  I[n\(tj,zJk;  qOLS)  —  nk  (where  the  nk  are  realizations  of  the  data  random  variables  Nk)  correspond 
to  realizations  of  the  error  terms  £k.  As  such,  these  residuals  should  appear  random  when  plotted  as  a  function 
of  the  magnitude  of  the  model  solution  [13,  Ch.  3].  However,  we  see  in  Figure  3  (top)  that  this  is  not  the  case. 
There  is  a  noticeable  increase  in  the  variance  of  the  residuals  as  the  magnitude  of  the  model  solution  increases. 
Thus  we  must  conclude  that  the  constant  variance  model  7  =  0  is  not  correct,  and  that  the  ordinary  least  squares 
cost  function  is  not  adequate  as  a  basis  for  model  comparison.  A  constant  coefficient  of  variation  (CCV)  model 
(7  =  1,  usually  called  relative  error)  leading  to  a  generalized  least  squares  (GLS)  problem  formulation  was  also 
considered  in  [25].  Again  residual  analysis  suggested  that  this  was  not  the  correct  formulation. 

In  fact,  the  most  appropriate  theoretical  value  of  7  appears  to  be  7  =  1/2  [25]  so  that  the  weighted  least 
squares  cost  function  (3)  and  estimator  (4)  should  be  used.  Because  the  weights  must  be  bounded  away  from 
zero  (see  assumption  (Al'c)),  we  define  the  weights  to  be 


[l[n\{tj,zl-,q*)) 

(J*)1/2 , 


1/2 


I[n](tj,zl-,q*)  >  I* 
,zJk',q*)  <  I* 


The  cutoff  value  I*  >  0  is  determined  by  the  experimenter  so  that  the  resulting  residuals  appear  random.  In  the 
work  that  follows,  I*  =  1  x  104.  Because  wJk  depends  upon  the  unknown  parameter  q* ,  an  iterative  optimization 
scheme  must  be  used  (see  [13,  Ch.  3]  or  [17,  Ch.  2]  for  details).  Using  an  estimate  of  q*  (say,  the  ordinary  least 
squares  estimate  qOLS),  one  computes  the  weights  according  to  the  formula  above  (with  q*  replaced  by  qOLS) 
and  then  uses  these  (fixed)  weights  to  compute  the  weighted  least  squares  estimate  qWLS .  One  can  then  use  this 
updated  estimate  to  compute  new  weights,  and  continue  the  process  iteratively  until  convergence  is  achieved. 
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Figure  2:  Left:  results  of  fitting  the  model  (12)  under  the  null  hypothesis  using  ordinary  least  squares.  Right: 
results  of  fitting  the  model  (12)  under  the  alternative  hypothesis  using  ordinary  least  squares.  Both  fits  to  data 
are  superb,  and  it  is  difficult  to  distinguish  between  the  two  models  from  these  graphics. 
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Figure  3:  Top:  residual  plots  for  the  mathematical  model  under  the  null  (left)  and  alternative  (right)  hypotheses, 
showing  the  residuals  in  terms  of  the  magnitude  of  the  model  solution.  Bottom:  modified  residual  plots  for  the 
mathematical  model  under  the  null  (left)  and  alternative  (right)  hypotheses,  showing  the  modified  residuals  in 
terms  of  the  magnitude  of  the  model  solution.  While  the  variance  of  the  standard  residuals  is  observed  to  grow 
with  the  magnitude  of  the  model  solution,  we  see  that  the  variance  of  the  modified  residuals  is  largely  constant. 


Residual  plots  obtained  using  the  theoretical  value  of  7  =  1/2  are  shown  in  Figure  3  (bottom).  (The  weighted 
least  squares  fit  to  data  is  not  shown,  as  the  results  are  sufficiently  similar  to  the  ordinary  least  squares  fits  of 
Figure  2.)  While  the  variance  of  the  standard  residuals  is  observed  to  increase  with  the  magnitude  of  the  model 
solution,  the  variance  of  the  modified  residuals  is  approximately  constant,  providing  some  evidence  that  the 
statistical  model  (18)  (with  7  =  1/2)  is  correct.  As  such,  we  are  prepared  to  use  the  model  comparison  test 
statistic  to  analyze  the  null  hypothesis. 

Under  the  null  hypothesis,  the  weighted  least  squares  cost  functional  is  JWLS (q^LS)  =  10.3040  x  106.  Under 
the  alternative  hypothesis,  the  weighted  least  squares  cost  functional  is  JWLS (qWLS)  =  8.8394  x  106.  Thus 
the  model  comparison  statistic  is  uWLS  ss  711.3  (there  are  4293  data  points).  We  conclude  then,  that  there 
is  overwhelming  evidence  to  reject  the  null  hypothesis  in  favor  of  the  alternative  hypothesis.  In  the  present 
application,  this  strongly  suggests  that  the  fraction  of  cells  which  remain  idle  over  the  course  of  the  experiment  is 
a  significant  mathematical  feature  which  should  be  included  in  an  accurate  model  of  proliferation  assay  data.  As 
with  the  cat  brain  diffusion-convection  model  discussed  in  the  introduction,  hypothesis  testing  has  again  allowed 
us  to  better  justify  the  use  of  key  model  components. 

This  example  illuminates  some  desirable  features  of  the  model  comparison  statistic.  The  statistic  is  very  easy 
to  calculate  from  the  minimized  least  squares  cost  functionals  and  provides  a  theoretically  sound  basis  on  which  to 
test  a  hypothesis.  We  also  see  that  the  test  statistic  can  provide  meaningful  comparisons  between  mathematical 
models  that  are  not  easily  distinguishable  from  the  fits  of  those  models  to  data  (Figure  2)  or  from  the  ordinary 
least  squares  cost  functions. 
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6  Concluding  Remarks 

In  this  note  we  have  outlined  how  hypothesis  testing  can  be  used  as  a  tool  for  model  comparison  for  certain  classes 
of  models.  Beginning  with  an  ordinary  least  squares  framework,  we  reviewed  assumptions  and  theorems  which 
form  the  basis  for  a  rigorous  quantitative  theory  of  model  comparison  based  upon  the  statistical  properties  of 
least  squares  estimators.  Two  similar  approaches,  one  from  [2]  and  one  from  [19],  have  been  considered  and  the 
relative  merits  of  the  different  assumptions  involved  have  been  reviewed.  The  former  approach  is  more  general 
in  the  sense  that  one  does  not  need  to  find  a  dominating  function  b(£,  x)  in  order  to  apply  the  theoretical  results 
presented.  Yet  the  tradeoff  is  that  one  must  place  a  stronger  (and  much  harder  to  verify  in  practice)  assumption 
on  the  hessian  matrix  J  when  compared  to  the  latter  approach.  It  appears  that  the  most  advantageous  approach 
will  depend  largely  upon  the  specific  context  in  which  the  theory  is  used. 

It  has  also  been  shown  that  the  techniques  for  model  comparison  by  hypothesis  testing  can  be  extended 
from  ordinary  least  squares  to  weighted  least  squares  through  a  change  of  variables  designed  to  account  for  a 
heteroscedastic  error  model.  The  result  is  significant  as  it  extends  the  utility  of  the  model  comparison  test  to 
include  situations  in  which  the  errors  characterizing  the  observation  process  are  independent  but  not  identically 
distributed.  As  noted  above,  the  same  technique  can  be  used  to  extend  the  result  to  account  for  non-independent 
errors  as  well,  provided  one  has  some  additional  information  regarding  the  correlation  between  observations. 

It  is  reassuring  that  the  primary  result  concerning  the  model  comparison  statistic  (either  U°LS  or  U™LS) 
is  essentially  unchanged-one  simply  computes  the  relative  difference  between  the  cost  functional  minimized  over 
the  admissible  parameter  space  and  the  cost  functional  minimized  over  the  linearly  restricted  parameter  space. 
In  effect,  one  must  only  take  care  to  properly  account  for  the  variance  of  the  observation  errors  in  the  formulation 
of  the  cost  function,  and  the  resulting  model  statistic  will  be  Chi-square  distributed.  The  test  is  similar  in  form 
to  ANOVA-type  statistical  tests  [2]  and  depends  only  on  the  residual  sums  of  squares  (RSS)  for  the  restricted  and 
unrestricted  models.  This  is  convenient  as  the  test  is  independent  of  model  parameterization,  and  the  RSS  can  be 
returned  as  a  byproduct  of  optimization  by  most  software  packages.  Because  the  test  is  designed  for  least  squares 
type  estimation,  only  the  first  two  central  moments  of  the  statistical  model  need  to  be  specified,  as  opposed  to  a 
conditional  density  function  in  maximum  likelihood  estimation  procedures.  The  major  computational  drawbacks 
of  the  method  are  that  one  must  compute  two  separate  inverse  problems  (as  opposed  to  use  of  the  Wald  test 
statistic  [19])  and  one  can  only  compare  nested  models  pairwise  (as  opposed  to  use  of  Akaike’s  Information 
Criterion  [14]). 

The  unspecified  form  of  the  function  w(x)  is  meant  to  indicate  its  wide  applicability.  An  important  subclass 
of  weighted  least  squares  problems,  sometimes  termed  generalized  least  squares  [13,  Cli.  3]  occurs  when  the 
magnitude  of  the  observation  error  is  assumed  to  be  directly  proportional  to  the  magnitude  of  the  true  model  value. 
In  that  case,  w(x)  =  f(x;q*).  As  seen  in  the  example  of  Section  5,  this  results  in  an  interesting  computational 
problem.  In  the  theory  above  it  is  tacitly  assumed  that  the  weights  w(xk)  are  known  exactly.  However,  one  does 
not  know  the  true  parameter  value  q*.  While  computational  strategies  exist  to  handle  such  situations  [13,  17],  the 
effects  of  this  additional  level  of  approximation  are  not  considered  in  the  theory  presented  here,  and  it  is  assumed 
that  the  weights,  even  when  computed  iteratively,  are  known  exactly.  More  generally,  the  weight  functions  may 
depend  upon  some  additional  nuisance  parameters,  w(x)  =  w(x;r )  (in  the  example  of  Section  5,  r  =  ({Aj},7)) 
which  must  be  estimated  using  a  multi-stage  procedure  [17,  19]. 

There  are  some  open  problems  in  extending  this  theory  further.  For  instance,  assumption  (A2)  effectively 
limits  the  application  of  the  results  presented  to  models  in  which  the  underlying  parameter  space  is  a  subset 
of  finite-dimensional  Euclidean  space.  Yet  in  some  situations,  for  instance  the  nonparametric  estimation  of 
probability  measures  [7],  the  ‘parameters’  to  be  estimated  are  contained  within  an  infinite-dimensional  function 
space.  While  computational  methodologies  have  been  developed  for  such  situations,  the  statistical  properties  of 
the  least  squares  estimators  have  not.  Thus  one  cannot  rely  upon  any  asymptotic  results  for  the  quantification 
of  uncertainty,  nor  can  one  test  hypotheses  in  the  manner  demonstrated  above. 

It  should  be  clear  by  this  point  that  model  comparison  tests  have  immense  potential  for  use  as  a  key  tool 
when  working  with  biological  modeling  applications.  The  results  presented  here  and  the  references  herein  provide 
a  rigorous  foundation  for  careful  use  of  these  tests  in  practice.  The  wide  applicability  of  the  results  presented 
here  will  continue  to  find  use  in  biological  modeling  projects,  as  well  as  provide  motivation  for  further  theoretical 
study  and  expansion  of  model  comparison  testing  ideas. 
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